A Visually Enhanced Neural Encoder for Synset Induction

Authors

Abstract

The synset induction task is to automatically cluster semantically identical instances, which are often represented by texts and images. Previous works mainly consider the textual parts while ignoring the visual counterparts. However, how to effectively employ visual information to enhance the semantic representation for the synset induction task is challenging. In this paper, we propose a Visually Enhanced NeUral Encoder (i.e., VENUE) to learn a multimodal representation for the synset induction task. The key insight lies in constructing multimodal representations through intra-modal and inter-modal interactions among images and text. Specifically, we first design an interaction module with an attention mechanism to capture the correlations among images and text. To obtain multi-granularity textual representations, we fuse the pre-trained tags and word embeddings. Second, we design a masking module to filter out weakly relevant information. Third, we present a gating module to adaptively regulate the modalities' contributions to semantics. A triplet loss is adopted to train the VENUE encoder for learning discriminative multimodal representations. Then, we perform clustering algorithms on the obtained representations to induce synsets. To verify our approach, we collect a multimodal dataset, i.e., MMAI-Synset, and conduct extensive experiments. The experimental results demonstrate that our method outperforms strong baselines on three groups of evaluation metrics.
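The abstract names VENUE's building blocks (attention-based interaction, masking, gating, and a triplet loss) but does not give their equations. The sketch below is a minimal pure-Python illustration of how such components are commonly built, not the paper's formulation. It assumes dot-product attention from a text vector over image-region vectors, a hypothetical threshold `tau` that masks weakly weighted regions, a single scalar sigmoid gate that mixes the text and visual vectors, and the standard hinge triplet loss with Euclidean distance. All function names, the threshold, the gate parameters, and the margin are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(query: Vector, regions: List[Vector], tau: float) -> Vector:
    """Attend from a text vector over image-region vectors, zero out regions
    whose weight is below the hypothetical threshold tau, renormalize,
    and return the weighted sum of the kept regions."""
    weights = softmax([dot(query, r) for r in regions])
    kept = [w if w >= tau else 0.0 for w in weights]
    if sum(kept) == 0.0:
        # Fallback: keep the single strongest region so we never divide by zero.
        best = max(range(len(weights)), key=weights.__getitem__)
        kept[best] = weights[best]
    total = sum(kept)
    kept = [w / total for w in kept]
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(kept, regions)) for d in range(dim)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text: Vector, visual: Vector,
                 w_t: Vector, w_v: Vector, b: float) -> Vector:
    """Scalar gate g in (0, 1): fused = g * text + (1 - g) * visual.
    w_t, w_v and b stand in for learned parameters (hypothetical here)."""
    g = sigmoid(dot(w_t, text) + dot(w_v, visual) + b)
    return [g * t + (1.0 - g) * v for t, v in zip(text, visual)]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    text = [1.0, 0.0]
    regions = [[1.0, 0.0], [0.0, 1.0]]
    visual = masked_attention(text, regions, tau=0.5)
    fused = gated_fusion(text, visual, w_t=[0.0, 0.0], w_v=[0.0, 0.0], b=0.0)
    print(visual, fused)
```

In a trained encoder of this kind, the gate and attention parameters would presumably be learned by minimizing the triplet loss over (anchor, positive, negative) instances, and the resulting fused vectors would then be passed to a clustering algorithm to induce synsets.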

Similar resources

Multi-Modal Word Synset Induction

A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (syn...

Neural time course of visually enhanced echo suppression.

Auditory spatial perception plays a critical role in day-to-day communication. For instance, listeners utilize acoustic spatial information to segregate individual talkers into distinct auditory "streams" to improve speech intelligibility. However, spatial localization is an exceedingly difficult task in everyday listening environments with numerous distracting echoes from nearby surfaces, such...

Multi-channel Encoder for Neural Machine Translation

Attention-based Encoder-Decoder has the effective architecture for neural machine translation (NMT), which typically relies on recurrent neural networks (RNN) to build the blocks that will be lately called by attentive reader during the decoding process. This design of encoder yields relatively uniform composition on source sentence, despite the gating mechanism employed in encoding RNN. On the...

A Convolutional Encoder Model for Neural Machine Translation

The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. We present a faster and simpler architecture based on a succession of convolutional layers. This allows to encode the source sentence simultaneously compared to recurrent networks for which computation is constrained by temporal dependencies. On WMT’16 English-Romanian translation w...

Journal

Journal title: Electronics

Year: 2023

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics12163521